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Summary. We develop a novel advanced Particle Markov chain Monte Carlo algorithm that is capable of sam- 

PJ ■ pling from the posterior distribution of non-linear state space models for both the unobserved latent states and 

the unknown model parameters. We apply this novel methodology to five population growth models, including 

_. models with strong and weak Allee effects, and test if it can efficiently sample from the complex likelihood surface 

^ I that is often associated with these models. Utilising real and also synthetically generated data sets we examine 

the extent to which observation noise and process error may frustrate efforts to choose between these models. 

i..^ ^ Our novel algorithm involves an Adaptive Metropolis proposal combined with an SIR Particle MCMC algorithm 

OO ' (AdPMCMC). We show that the AdPMCMC algorithm samples complex, high-dimensional spaces efficiently, and 

P^ , is therefore superior to standard Gibbs or Metropolis Hastings algorithms that are known to converge very slowly 



when applied to the non-linear state space ecological models considered in this paper. Additionally we show 
how the AdPMCMC al gorithm can be used to recursively estimate the Bayesian Cramer-Rao Lower Bound of 



Tichavskv et al.. 199811 . We derive expressions for these Cramer-Rao Bounds and estimate them for the models 
considered. Our results demonstrate a number of important features of common population growth models, most 
notably their multi-modal likelihood surfaces and dependence between the static and dynamic parameters. We 
conclude by sampling from the posterior distribution of each of the models, and use Bayes factors to highlight 
how observation noise significantly diminishes our ability to select among some of the models, particularly those 
that are designed to reproduce an Allee effect. These result have important ramifications for ecologists searching 
for evidence of all forms of density dependence in population time series. 
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1. Introduction 



Ecologists in recent years have begun to fit Bayesian state space models to ecological time series, 



Millar and Mever. 2000 . 



Clark and Biornstad. 2004 IWard. 2006l | . State space models (SSMs) avoid biases 



that can occ ur when either the observations are assumed to be perfect or the process model is t reated deter 



ministically 



Carrol et al.. 1995 



Shenk et al.. 1998 



Calderet al. 2003 . 



Freckletonet al. 2006l |. SSM infer 



ence involves jointly estimating the latent state vector and the static parameter vector of the models for obser- 
vations and states. Many ecological studies that estimate density depei idence using time series data, either do 



Siblv et al- 2005 



Saether et al. 2008 . 



not include jointly es timated observation error or exclude it altogether 

Gregory et al.. 2010| . Studies that do incorporate process error and obser vation noise into ecological inference 



sometimes use very specia l cases of linear state and observation equations Lindley. 2003 , 



tended Kalman Filter 



Williams et al. 2003 . 



Linden and Knape. 2009 [. More reahstic, non-linear models may be implement ed sub-optimally in an Ex- 



Wang. 2007 . 



Zeng et al.. 1998 . 



Welch and Bishop. 1995l |. Linear approximations of 



non-linear state equations jWang. 2007l | can allow for a combination of suboptimal Kalman filtering and 
Gibbs sampling to be performed. Linear approximation, however, can lead to inaccurate parameter poste- 
rior inference for non-linear SSM with non-negligible observation noise and process error. 

Studies that employ the Metropolis -Hastings within Gibbs (MHG) algorithm avoid these restrictive 



linear assumptions or local linearizations, Millar and Mever. 200C . 



Ward. 200fi . 



Clark and Biornstad. 2004 . 



Saether et al.. 2007l |. but the performance of MHG is known to be very se nsitive to the process model and 



observation model parameterisation as discussed in 



Andrieu et al. 2010l. The standard MHG algorithm 



M 



Millar and Mever. 2000J . for two reasons. Firstly, 



can be difficult to tune for non-linear ecological SSMs, 
non-negligible posterior correlations can occur in the high dimensional parameter space that is composed 
of both the latent state process and the static parameters. Secondly, the joint likelihood surfaces of the 
static parameters for popular non-linear populati on growth models (irr espective of observation error) can 



be multi-modal and have strong sharp fiat ridges. 



Polanskv et al.. 2009{ . Furthermore, an MHG algorithm 



that has not sufficiently mixed over modes and ridges will corrupt all model selection and model averaging 
routines. 

In this paper, we describe a novel technical development that is specifically designed to address non- 
linear dynamics in SSMs. The method uses a Pa rticle MCMC algorit hm for joint process and parameter 
estimation in non-linear and non-Gaussian SSMs Andrieu et al.. 2010l |. coupled to an adaptive Metropolis 
proposal which we denote by the Adaptive Particle Markov chain Monte Carlo algorithm (AdPMCMC). We 
apply this novel methodology to synthetic and real data sets. Our objectives are to examine the performance 
of the new algorithm in non-linear population growth models, with complex likelihood surfaces, and to re- 
examine the problems of model selection with Bayes factors estimated via Markov chain samples. We find 
that the novel methodology is capable of efficiently sampling highly non-linear models without careful tuning 
in very high dimensional posterior models. 

Complicated likelihood surfaces with multimodal characteristics present a serious challenge to Bayesian 



posterior samplers. 



Polanskv et al 



■ ■ 2004 



The failure of the basic MHG sampler to mix over the support 



of the posterior in such situations is a known concern in this case for ecological m odellers and in general a 
well known problem in the statist ics literature, see discussions and illustrations in 



Robert. 2001 . 



Carlin and Louis. 1997 



Bishop et al.. 2006l | . Our AdPMCMC methodology will be shown to be capable of efficiently 



exploring the posterior distributions obtained from typical ecological growth models, overcoming many of 
the poor mixing properties of the basic MHG sampler in this context. Additionally, we use Bayes factors to 
highlight the ambiguity in model structure when fitting models to short ecological time series and therefore 
affirm that predictions outside the observed data made by a single best fitting model can be misleading. 
This is especially important when the intention is to capture effects that are difficult to observe, such as the 
Alice effect. It is precisely in estimation of quantities such as Bayes factors, that one requires a sampling 
methodology capable of efficiently delivering samples from the entire support of the posterior distribution. 

1.1. Contribution 

This paper addresses three main concerns abo ut model fitting to ecological time series. The first of these is 



the inclusion of observation error. 



Wang [2007l | , argues that observation error should be included in ecological 



modelling when estimating abundance of animals because measurement errors arise from several sources, for 
example unaccounted effects of weather on the probability of detecting animals or ra ndom failures of trap ping 
devices. Ignoring observation error can lead to biased estimates of the parameters Carrol et al., 1995| . We 



argue it will also have an important effect on model selection and the ability to detect ecological effects such 
as density dependence and the strength of an Alice effect. 

The second issue involves model simplification. It is difficult to jointly estimate the latent process 
with the model parameters, especially when the state or observation process are non-linear and there is 
strong dependence between model parameters and the latent process. The approach we present will allow us 
to work directly with realistic non-linear models in the presence of arbitrarily high process and observation 
noise settings. Moreover, we do not introduce any approximation e rror as we ha ve no need to linearise such 
models to perform filtering, such as in the approaches discussed in jWang. 2007 1. 

The third issue relates to efficient estimation and robust model selection. To date relatively sim- 
)le Metropolis-Hastings and Gibbs sampling routines have been used to fit ecologi cal state space models 



Millar and Mever. 2000llClark and Biornstad. 2004 . 



Ward. 200fi . 



Sasther et al.. 2007l | . These algorithms are 



prone to convergence difficulties for two reasons: the first is due to non-negligble posterior correlations that 
will occur in the high dimensional parameter space that is composed of both the latent state process and the 
static parameters; the second is due to posterior multimodality. The AdPMCMC we propose is specifically 
designed for joint process and parameter estimation in non-linear and non-Gaussian states space models. 

1.2. Structure and Notation 

The paper is structured as follows. In Section 2, we present five common population growth models, including 
four that are non-linear in the states, parameters or bot h. Two of the se models include weak or strong Allee 
effects which can be important in ecological populations 



Dennis. 2002 



In Section 3, we summarise a number 



of issues regarding Bayesian estimation, prior selection and the likelihood surface that are important and 



relevant to the issues identified above. Section 4 presents the details of the estimation and model selection 
procedure, including the AdPMCMC algorithm. Section 5 presents the results and discussion which is split 
into two parts. The first part considers results for synthetic data and the performance of the algorithm. The 
second part presents the results for two real data sets under the newly developed sampling methodology. 
The paper concludes with brief discussion and recommendations for future developments including more 
sophisticated adaption routines and non-Gaussian error structures. 

The notation used throughout this paper will involve; capitals to denote random variables and lower 
case to denote realizations of random variables; bold face to denote vectors and non-bold for scalars; sub- 
script will denote discrete time, where ui-^t denotes ni, . . . ,nT- In the sampling methodology combining 
MCMC and particle filtering (Sequential Monte Carlo - SMC), we use the notation [■]{j, i) to denote the j-th 
state of the Markov chain for the i-th particle in the particle filter. Note, the index i is dropped for quantities 
not involving the particle filter. We will define a proposed new state prior to acceptance at iteration j of the 
Markov chain by [•](j)'. In addition, we note that 9 generically represents the particular model parameters 
under consideration, for example Mq has 6 — (bo, a^, aw) and M4 will have 9 — (65, 6g, 67, , ctj, ctu,). 

2. Ecological State Space Models 

This section presents the ecological state space models considered, with discussion from the ecological mod- 
elling perspective, regarding density dependence and weak or strong AUee effects. 

2.1. Observation equations 

The generic observation equation we consider is, 

yt = gt [nt) + wt (1) 

where wt ^ N{0,aw)- The observation model acknowledges that we typically do not observe the entire 
population of interest, but rather a sample realization and that the observation mechanisms we use are 
imperfect. It therefore refiects all sources of variability introduced by the data-generating mechanism. The 
Gaussian observation err or assumption for log tran sformed abundances is common and widely applied in 



ecological contexts, e.g. 



Clark and Biornstad. 2004l |. Importantly, this assum ption can be easily extended 



to include more flexible classes of observation noise, such as a- stable models 



Peters et al.. 2009l | or others 



Stenseth et al.. 2003L IWard. 2006l | , under the methodological 



recently proposed in the ecological literature 
framework that we develop in this paper. 



2.2. Process equations 

We consider five different stochastic population dynamic models that encapsulate different ecological effects 
such as density dependendent mortality and strong and weak Allee effects. Each of these process models 
describes how the number of individuals in a population affect the subsequent growth of the population. In 
these models Nt represents a continous latent random variable for the population size. The process error in 



these models (e^ ~ N{0, a^)) reflects all sources of variability in the underlying population growth process 
that is not captured by t he model. P rocess error in all of the models is assumed to behave multiplicatively 
on the natural scale, see Clark. 2007 1. 



2.2.1. The exponential growth equation (Mq) 

The exponential growth model provides a simple density independent model for comparison with potentially 
more realistic density dependent models. The discrete, exponential growth log transformed equation is 

log Nt+i = log Nt + bo + et. (2) 

where bQ = r is the maximum per-individual growth rate (defined as the difference between the per individual 
birth and death rates). 

2.2.2. The Richer equation (Mi) 

The discrete Ricker equation is a density dependent model that is similar to the familiar logistic growth 
model. After log transformation it is given by, 

log Nt+i = log Nt + bo + biNt + e*, (3) 

where b^ is the density-independent growth rate and bi governs the strength of density dependence. For 
populations that exhibit negative density dependence (6i < 0), the "carrying capacity" of the environment 
(usually denoted K) is defined by the stable equilibrum at — jS. so long as the density independent growth 
rate is positive (6o > 0). If 6o < then the only stable equilibrium is located at 0. Whilst linear in its 
parameters, this model is non- linear in the latent state because it contains an exponential term in Nt- 



2.2.3. The theta-l ogistic equation (M2) 
Lande et al.. 2003| recommend the theta-logistic equation when the form of the density dependence in the 



population dynamics is unknown, given here on the log scale 

\ogNt+i = logNt + bo + b2 {Ntf' + et, (4) 

where &3 determines the form of density dependence. For the carrying capacity {K = ( ^5^ ) ^ ) to exist, 5o 
and 62 must be of opposing sign, and it is stable only if bo and 63 are of the same sign. 



2.2.4. The "mate-hmited" logistic equation (M3) 



Morris and Doak. 2002J propose a "mate-limited" Allee effect equation, presented here under a log trans- 
formation, 

log Nt+i = 2 log Nt - log (64 + Nt) + 60 + biNt + et, (5) 



where 64 > represents the population size at which the per-individual birth rate is half of what it would 
be if mating w as not limited, thereby controling the population size at which AUee effects are "noticeable" . 



Dennis. 2002j notes that mate-limited models can describe, in a phenomenological fashion, AUee effects from 



other biological mechanisms besides mate limitation. 

2.2.5. The "Flexible- AUee" logistic equation (A'U) 

This modified version of the Ricker model allows for both strong and weak AUee effects, 

log Nt+i = log iVt + 65 + beNt + byNf + tt , (6) 

i-be-Jbl-ibr.b-') {-be + Jbl-Ab^bj') 

where K = -^^ 2h ^ ^'^'^ ^ = "^ 2fc ^^ these roots are real, then C represents the 

threshold of population size below which per capita population growth is negative. In the deterministic 
model, when Q < C < K , the AUee effect is strong and it is possible for an unstable equilibrium at iV = C 
to occur between two stable equilibria, A^ = and N = K. When C < 0, the AUee effect is weak and a 
single stable equilibria occurs &t N = K . If C and K are not real, then A^ = is the only stable equilibrium. 
This model is a discr ete version of a continuous tim e model previously used i n theoretical studies of spatial 



population dynamics 



Lewis and Kareiva. 1993l |: see 



Boukal and Berec. 2002 



for further discussion of AUee 



effects in discrete time population models. Note that model M4 differs from model M3 in that the latter can 
only exhibit a strong AUee effect. 

The process models Afo to Af4 are nested models. Specifically, setting 67 = in model A/4 and 64 = 
in model A/3 returns model A/i . Setting 63 = 1 in model A/2 also returns model A/i . Setting 61 = in model 
Ml returns the density independent model A/q. 

3. Bayesian Estimation 

Our approach estimates the latent process and model parameters for models A/q through to A/4 using 
advanced Bayesian inference. The Markov stochastic process Nq.t is unobserved (latent) and must be 
estimated over the time interval [0,r] at discrete time points t = through to T. In this setting we are 
interested in estimating the entire data set as opposed to a sequential, real-time estimates of the latent 
states jointly with the model parameters. As such the posterior of interest to this problem is given by 
p(0, ni-T\yi:T, Mi), where we have absorbed the initial state value (A'o) into the vector of static parameters 
6. 

3. 1. Priors, Likelihood and Posterior 

Table [T] summarises the priors selected for the process and observation model parameters. The process model 



parameters &0j b 



^2 1 ^3 1 ^5 1 ^6 and &7 are given Gaussian distribution priors. The prior for 64 is assumed to 



be nonnegative Morris and Doak. 2002J and is given a Gamma prior distribution. In addition, we give the 

10 



noise variances a^ , a^, inverse gamma priors with the parameters a^ — aw = -^ and f3e — (3w ^ -^ 



Table 1. Priors for the parameters. 




Parameters 


Priors 




&0, 1,2, 3, 5, 6, 7 


iV(0,l) 




bi 


G [shape = 1, scale = 


= 10) 


ol 


/G(a„/?.) 




-I 


IG{a^,M 





We also assume that observations are independent and identically distributed. Hence, the generic 
likelihood is given by 

T 

C{ni.,T,6;yi.,T) ='Wp{yt\nt,(jl,) , (7) 

t=i 

where p(j/t|nt,CT^) is a Gaussian distribution with mean rit and miknown variance cr^. We assume that 

the observations are conditionally independent given the latent state, with latent states under a first order 

Markov dependence. This produces a target posterior distribution given by 

T 

p{0,ni;T\yi:T) (yip{0)W_p{yt\nt,0)p{nt\nt-i,0) . (8) 

4=1 



3.2. Estimation and Model Selection 

Having specified the posterior distribution, inference and model selection proceeds by estimating popular 
Bayesian statistics such as the Minimum Mean Square Error (MMSE = posterior mean) and evaluating the 
posterior evidence for each model. In our context this requires estimates of the following quantities 



MMSE: (n^^^^^'^^,&^'^^se\ ^ f q f Ni.,TPie,ni.,T\yi..T,M,) dN^.r d&, 
Posterior evidence Mf. p{yi:T\Mi) — / p{yi:T\6 , n^T , Mi) p{6 , nuxlMi) dN^x d6 . 



(9) 



Estimating these quantities for non-linear state space models presents a considerable statistical challenge. 
The dimension of the posterior distribution is dim(0) + T, hence for moderate T (w 50 to 100) the dimension 
is large. Controlling the variance of these Bayesian estimators requires efficient posterior sampling methods 
from the posterior d istribution, which we achieve via our adaptive version of the PMCMC algorithm of 
Andrieu et al.. 20ld |. 



3.3. Adaptive MCMC within Particle MCMC (AdPMCMC). 

The aim of this section is to present a novel methodology to sample from the posterior distribution given in 
Equation dHt, based o n a version of a recent sampler specifically developed for use in state space models; see 
Andrieu et al.. 2010J . The methodology is known as PMCMC, and it represents a state of the art sampling 



framework for state space problems. These samples can then be used to form Monte Carlo estimates for 
Equations ([9]). The fundamental innovation we present in this paper is to combine an Adaptive MCMC 
algorithm within the PMCMC framework. 



7 



The key advantage of the PMCMC algorithm is that it aUows one to jointly update the entire set of 
posterior parameters (0, A^i:t) and only requires calculation of the marginal acceptance probability in the 
Metropolis-Hastings algorithm. PMCMC achieves this by embedding a particle filter estimate of the optimal 
proposal distribution for the latent process into the MCMC algorithm. This allows the Markov chain to 
mix efficiently in the high dimensional posterior parameter space because the particle filter approximation 
of the optimal proposal distribution in the MCMC algorithm, thereby allowing high-dimensional parameter 
block updates even in the presence of strong posterior parameter dependence. The models considered in this 
paper for example have between three and seven static parameters to be estimated jointly with the latent 
state space, resulting in an additional T parameters, where T can range from the tens to the hundreds, or 
even thousands, depending on the data set considered. 

In the state space setting, the Particle MCMC algorithm used to sample from the target distribution 
p {6, A^i:t|2/1:t) proceeds by mimicking the marginal Metropolis-Hastings algorithm in which the acceptance 
probability is given by, 

nn y iw« yy. ■ (, P ([g] (jT |yi:T) g ([gJO^lg] (j ^ 1)) \ 

a ( 6», ni:T - 1 ' ^' ^Ji:T U = mm 1, . ,,„.,. ,^ rau ■v^ ' 

V p{w\{j - ^)\yi:T)q{[o\{3 - 1), m{jy)j 

where q{[0]{j — 1), [9]{i)) is the proposal distribution of the PMCMC generated Markov chain for the static 
parameters to propose a move from state [ff\{j — 1) at iteration j — 1 to a new state [d]{j) at iteration j. 

To achieve this we split the standard Metropolis Hastings proposal dis tribution into two components . 
The first constructs a proposa l kernel via an adaptive Metropolis scheme ( 



Roberts and Rosenthal 



■ 2000! ■ 



Atchade and Rosenthal. 200a |) that is used to sample the static parameters 0. Introducing an adaptive 
MCMC proposal kernel into the Particle MCMC setting allows the Markov chain proposal distribution to 
adaptively learn the regions of the marginal posterior distribution of the static model parameters that have 
the most mass. This significantly improves the acceptance probability of the proposal distribution and 
enables much more rapid and efficient mixing of the Markov chain for a small number of particles L and 
a simple particle filter algorithm. The second component of the proposal kernel constructs an estimate 
of the posterior distribution of the latent states, Ni-t, allowing us to sample a proposed trajectory. This 
proposal kernel is con s tructed via a Sequential Mon te Carlo algorithm that is based on a simple SIR filter, 
f [Gordon et al.. IQQaj . jDoucet and Johansen. 20od |). Note the SIR fiher is suitable for PMCMC applica- 



tions since it is only used as a proposal distribution and n ot as an empirical estimate of the posterior as 



is more common; see discussion of standard application in Doucet and Johansen. 2009l |. In particular the 
proposal kernel approximates the optimal choice 

q{[e,n,..T]{j - 1), [e,m..T]uy) ^qmu - 1), [^Kj)')? (K:T](jr lz/i:T, [eiOT) , (10) 

via an adaptive MCMC proposal for q {[8]{j — 1), [0](j)') and a particle filter (SMC) estimate for 
P (['^i:T](j)'|yi:T, [^](j)')- The SMC algorithm proposal samples (approximately) from the sequence of distri- 
butions, ^p(ni;t\yi;t 6)}t=i;T- FoT a recent revie w of SMC methodology, of which there are several different 



methodologies, see 



Doucet and Johansen. 200 



I 



When this proposal is substituted into the standard Metropohs Hastings acceptance probabihty, sev- 
eral terms cancel to produce, 

a ([0, n,rKj - 1), [6, n,rKj) ) = m:n [l, ^ ^^^,^^^e]ij ^ 1)) p mU - I)) , mU - imUY) 
We can now detail a generic version of the AdPMCMC methodology we developed. 

3.3.1. Generic Particle MCMC 

One iteration of the generic AdPMCMC algorithm proceeds as follows: 

(a) Sample [0](j)' -- q i[0]{j - 1), •) from an Adaptive MCMC proposal. (Appendix 1. Algorithm 2). 

(b) Run an SMC algorithm with L particles to obtain: 

L 

pini..T\yi:T, [dKj)') = Xl^i:T'^["i:r]0»' ("i^t) 

'=' (11) 



Piyl■.TmiJy) = f[lJY,wt{[ntU^y) 



t=l \ i=l / 

Then sample a candidate path [A^i:t](j)' '^ P [ni;T\yi:T ^ [^](j)')- (see Appendix 1. Algorithm 1) 
(c) Accept the proposed new Markov chain state comprised of [0, iVi:T](j)' with acceptance probability 
given by 

(\R ir iwfl M-y\ rr.- f^ piyi.T\muy)pmuy)imuy,mu-i)) \ i,^. 

where P {yi:T\[d]{j — 1)) is obtained from the previous iteration of the PMCMC algorithm. 

The key advantage of this approach is that the difficult problem of designing high dimensional pro- 
posals has been replaced with the simpler problem of designing low dimensional mutation kernels in the 
Sequential Monte Carlo algorithm embedded in the MCMC algorithm. This sampling approach in which 
Sequential Monte Carlo is used to approximate the marginal likelihood in the acceptance probability has 
been shown to have several theoretical convergence properties. In particular the empirical law of the parti- 
cles converges to the true filteri ng distribution at ea ch iteration as a bounded linear function of time t and 



the number of particles L; see 



- 201oi 



Andrieu et al.. 2010l |. This means is is possible to construct and efficiently 



sample approximately optimal path space proposals with linear cost. 

3.3.2. Generic Adaptive MCMC 

We utilise the adaptive MCMC algorithm to learn the proposal distributi on for the static parameters in our 



Andrieu and Atchade. 20061 1. The 



posterior 0. There are several classes of adaptive MCMC algorithms; see 
distinguishing feature of adaptive MCMC algorithms, (compared to standard MCMC), is that the Markov 
chain is generated via a sequence of transition kernels. Adaptive algorithms utilise a combination of time 
or state inhomogeneous proposal kernels. Each proposal in the sequence is allowed to depend on the past 
history of the Markov chain generated, resulting in many possible variants. 



When using inhomogeneous Markov kernels it is particularly important to ensure the generated 
Markov chain is ergodic, with the appropriate stationary distribution. Several recent papers proposing 
theoretical conditions that ni ust b e satisfied to ensure ergodicity of a daptive algorithms include, 



Andrieu and Moulines. 2006J and Haario et al.. 2005! |. In particular Roberts and Rosenthal. 2009l | proved 



ergodicity of adaptive MCMC under conditions known as Diminishing Adaptation and Bounded Convergence 

It is non-trivial to develop adaption schemes that are easily verified to satisfy these two conditions 

In this paper use a mixture proposal kernel known to satisfy these two ergodicit y conditions when unbounded 



state spaces and general classes of target posterior distributions are utilised; see Roberts and Rosenthal. 2009 1. 



3.4. Adaptive Metropolis within SIR Particle MCMC (AdPMCMC). 

In this section (and Appendix 1) we present the details of the Adaptive Metropolis within Particle MCMC 
(AdPMCMC) algorithm used to sample from the posterior on the path space of our latent states and model 
parameters. This involves specifying the details of the proposal distribution in Equation 1101 The proposal, 
q([0]{j — 1), [9]{jy), involves an adaptive Metropolis proposal comprised of a mixture of Gaussians, one 
component of which has a covariance structure that is adaptively learnt on-line as the algorithm progressively 
explores the posterior distribution. The mixture proposal distribution for parameters 6 is given at iteration 
j of the Markov chain by, 

q, mU - 1), •) - ^iN U [9]{j - 1), ^^S,) + (1 - w^)nU [e]{j - 1), ^^IdA ■ (13) 

Here, S^ is the current empirical estimate of the covariance between the parameters of 6 estimated using 
samples from the Particle Markov chain up to tin ie ?'. The theoretical motivatio n for the choices of scale fac- 
tors 2.38, 0.1 and dimen sion d are all provided in 



conditions presented in 



Roberts and Rosenthal. 2001 



M 



Roberts and Rosenthal. 2009l | and are based on optimality 



The proposal kernel for the latent states Ni,t, given by p{[ni;T]{jY\yi:T, [^](j)'): uses the simplest 
SMC algorithm known as the SIR algorithm. Therefore the mutation kernel, is given by the process model, 
in our case N {Nt; ft (A^t-i, 6*) , a^) . 

4. Results and Analysis 

This section is split into four subsections, the first two subsections study the performance and estimation 
accuracy of the AdPMCMC algorithm using synthetic data generated from models (Mq, . . . , M4) with known 
parameters. We gauge the accuracy of the algorithm by comparing the marginalised average estimated mean 
square error (MSE) of the MMSE estimate of the latent state process Ni-t (estimated over 20 blocks per data 
set with 20 data sets for the average) to a recursively defined Bayesian Cramer- Rao Lower Bound (BCRLB). 
In the third subsection, we study the performance of Bayes factor estimates for each of the models using 
evidence estimates derived from Markov chain samples obtained via the AdPMCMC algorithm. In the final 
sub-section, we perform parameter estimation and model selection using two real data sets. 

10 



Each iteration of the AdPMCMC algorithm an approximation to the optimal proposal is constructed 
and sampled to produce a proposed Markov chain state update in dimension dim (0) + T. Additionally, all 
simulation studies in this paper involved generation of a Markov chain via the AdPMCMC algorithm with 
the following three stages: 

stage 1: ANNEALED PMCMC - This stage involves a random initialization of the posterior parameters {9, Ni-t) 
followed by an annealed AdPMCMC algorithm, using a sequence of distributions 

Pn{e, Ni.,T\yi:T) = P{0, Ni.,T)^-^"p{yi:T\e, Ni.,tV\ 

where 7„ increases linearly from 10^^ to 1 for the first 5,000 AdPMCMC steps. The annealing must 
also be integrated into the construction of the particle filter proprosal, and impacts directly on the 
estimate of the marginal likelihood for 6.\ 

stage 2: NON ADAPTIVE PMCMC - This stage involves a burn-in chain of 5,000 iterations in which a non- 
adaptive proposal is utilised for the static parameters 6 based on the non-adaptive mixture component 
in proposal Equation ([T5|) and the SIR particle filter for the latent process states. These burn-in 
samples are used to form an initial estimate of the covariance matrix for the first iteration of the 
adaptive MCMC proposal; 

stage 3: ADAPTIVE AdPMCMC - This stage involves generating samples using the SIR particle filter proposal 
and the mixture adaptive Metropolis proposal Equation (IT^ for 50,000 samples that are subsequently 
used in estimating model parameters. 

In addition to these burnin stages one could utilise a combination of tempering throughout the PM- 
CMC chain. However, we note that when performing the tempering stage on the marginal likelihood, one 
should be careful to avoid bias in the acceptance probability. One way to overcome this problem and to 
still maintain all the Markov chain samples would be to perform post processing of the tempered samples 



with temperature not equal to one, under the Importance Tempering framework of Gramacv et al.. 201C 
within a tempered PMCMC algorithm. It should also be noted, that if performing this additional tem- 
pering, one must be very careful about the inclusion of adaptive MCMC for the static parameters and the 
conditions of bounded convergence and diminishing adaptation should be verified. Finally, we point out 
that adaption of the mutation kernel in the SMC component of the algorithm can be performed arbitrarily 
wi thout any constraint on the adaption rate and one possible approach to consider involves the methodology 
of jCornebise et al. 2008 1. 



4. 1. Simulation Study - Syntlietic Data M2 - Tlieta Logistic. 

Here we study the performance of the AdPMCMC algorithm in the context of a challenging non-linear 
process model: the theta- logistic (M2). We randomly generate data sets from the model, and we show that 
the AdPMCMC sampling methodology produces a Markov chain that mixes efficiently and can accurately 
obtain estimates of Equation ([9]), even in the presence of significant observation noise. 
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The parameters used for the synthetic studies in this first subsection are (60 = 0.15, 62 = —0.125, 63 = 
0.1,{K = 6.2,6* = 0.1), (T.^ = 0.39, tJe = 0.47, no = 1.27). We produce T = 50 observations using the 
simulated values of the latent process. We begin the analysis of the AdPMCMC algorithm for a range of 
particle numbers L S {20, 50, 100, 200, 500, 1000, 5000}. Figure [1] shows the average acceptance probability 
of the AdPMCMC algorithm, which proposes updates of the entire Markov chain state (iVi^^, d) in 56 
dimensions, at each iteration of the AdPMCMC sampler, as a function of L. As expected, increasing the 
number of particles results in an improved estimate of the optimal proposal distribution p{ni:T\yi:T, S') in 
Equation ()10p and a lower variance estimate for marginal likelihood p{yi;T\0') in Equation ()11|) . ultimately 
improving the acceptance probability. The results demonstrate that it is reasonable to perform simulations 
with L = 500 particles which produces efficient mixing in the AdPMCMC sampler. 
The bottom panel of Figure [2] presents the autocorrelation function (ACF) for ct^ for different numbers of 
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Fig. 1 . Average acceptance probabilities of the AdPIVICIVIC algorithm proposing updates of the MCMC in 56 dimensional 
space at each iteration, averaged over 50,000 Markov chain samples post burn-in, as a function of the number of 
particles L. 



particles L. The top panel plots the Geweke Z-score time series diagnostic JGeweke et al.. 1992j | for the a^ 
parameter as a function of L. The settings for this diagnostic considered the difference between the means of 
the first 10% and the last 50%. This is calculated for the Markov chain at lengths of t G {5k, lOfc, . . . , 50fc}. 
We note that technically this diagnostic is derived for a non-adaptive Markov chain proposal. We apply this 
result here as a guide to convergence arguing it still produces informative results once the rate of adaption 
slows down, because the covariance structure used in the proposal becomes close to constant. This occurs 
once the chain has mixed sufficiently over the posterior support. 

Figure |3] shows the trace plots of the Markov chain sample paths for the static parameters (j'^,a'^, 
Nq, 60, &2, ^3. The first 5,000 samples demonstrate that the annealing stage can handle initializations of the 
Markov chain far from the true parameter values used to generate the data. The second stage, from samples 
5,000 to 10,000, demonstrates the slow mixing performance of the untuned non-adaptive chain. Finally the 
samples from 10,000 to 60,000 clearly show the significant improvement of including an adaption stage in 
the AdPMCMC algorithm. 

The top panel of Figure |4] presents the true generated latent process N^.^P^ and the observations. 
The bottom panel shows a comparison of the N'^.jP^ versus boxplots of each state element Ni obtained 
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Fig. 2. Diagnostics for the AdPiVICiVIC algorithm calculated post burn-in over 50,000 l\/larkov chain samples. TOP 
SUBPLOT: Geweke Z-score statistics for noise a^ versus L. BOTTOM SUBPLOT: ACF function of the noise ct„ versus 
the number of particles L. 
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Fig. 3. Trace plots of the sample paths for the marginal IVIarkov chain parameters o-^, al,,No, bo, &2, 63 based on L = 
5000 with STAGE 1 : the first 5,000 samples from the annealed stage (to the left of the vertical dotted line); STAGE 2: 
the samples from 5,001 to 1 0,000 involve the non-adaptive PMCMC stage (between the dotted and dashed lines) and; 
STAGE 3: the samples from 10,001 to 60,000. 

using 50,000 post burn-in samples from the AdPMCMC algorithm. We also show the estimated path space 
MMSE, N^j^^^ and the 95% posterior predictive intervals shaded around this MMSE estimate. This plot 
demonstrates that the MMSE estimate is accurate. 
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Fig. 4. TOP PANEL: The true latent process (solid line) and the observations (dashed line). BOTTOM PANEL: Box 
plots for the path space estimates {[Ni:t]{j)}j=i obtained from the PMCMC algorithm. In addition, we present the 
estimated NffT^^"^^ as a solid line and the grey colouring presents the 95% posterior predictive intervals for each Nt. 



Figure [5] demonstrates, for L = 5,000 particles, scatter plots of the pairwise marginal distributions 
for each static parameter (lower triangular region of the matrix plot), the kernel density estimate of the 
marginal posterior distribution for each parameter (diagonal of the matrix plot) and the estimated posterior 
correlation coefficient between the posterior parameters (upper triangular matrix plot). 

Figure [S] shows, for L = 5,000 particles, a heat map for the estimated posterior correlation matrix 
obtained using samples from the AdPMCMC for the path space and static parameters, p{9,Tii;T\yi:T)- 
4.2. Synthetic Data - Mean Square Error (MSE) Analysis 

In this section we study the MSE estimates of the latent process Ni-t, after integrating out the uncertainty 
in the static parameters 6, for a range of SMC particle counts L. This will illustrate the accuracy of the 
AdPMCMC estimates of the true underlying process for a given signal to noise ratio. We also derive a 
recursive expression for the Bayesian Cramer-Rao Lower Bound (BCRLB) for each model Mo to M4 as a 
lower bound comparison. We demonstrate how, for a set of D data sets each of length T, the BCRLB can 
be trivially estimated at no additional computational cost in our mod el framework, recursive ly for each time 
step t, via the AdPMCMC algorithm and a modified recursion from [Tichavskv et al.. 1998J . 
Cramer-Rao Lower Bound for the Path Space Proposal in PMCMC 

To estimate the Cramer-Rao Lower Bound in the Bayesian context we do not require that the estimator of 
interest, in our case N^ij^^^-^ , be unbiased. However we do require that the model is specified such that the 
following two conditions hold. 

Condition 1: for each of the static model parameters 0^*-' G [a^*-', &^*''], the prior model piO'^^^) satisfies that 
lime(,)^,w p{e^'^) -> and lime(.,^fo(.) p{e^'^) -> 
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Fig. 5. Scatter plots of the posterior distributions for tfie static parameters 0, this plot also contains the smoothed es- 
timated marginal posterior distributions for each static parameter, followed by the estimated linear posterior correlation 
coefficient between each static parameter. 
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Fig. 6. Heat map in grey scale for the posterior correlation matrix between static parameters and the path space, Ni-.t, 
given in the first 50 rows and columns, followed by the state parameters in the last six. 

Condition 2: The following smoothness properties of the likelihood hold 6'^^\ 

dfe{.){yi:T) 



de^) 



-dyi:T = 



Under these conditions we may use the results of 



Tichavskv et al.. 1998l | in which recursive expressions for 



the BCRLB are derived for general non-linear state space models. We modify these results to integrate out 
the posterior uncertainty in the joint estimates of the static parameters 6. We derive results that perform 
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this marginalization numerically utilizing the existing AdPMCMC framework for each data set. 

In particular we estimate for models Mq through to M^ the following mean square error (mse), 
recursively in time t, 



Nl:T - Nl.,T 



^p{ni:T,yi:T\6) 



Nl:T - Nl.,T 

Ni:T - Nut 



P {ni:T, yi:T, 0) dni.,Tdyi:Tdd 



Nl:T - Nl:q 



(14) 



Pio)de, 



where in this paper we focus on the MMSE estimator Nt = TV*^*^"^^. The BCRLB provides a lower bound 
on the MSE used to estimate the path space parameters which correspond in our model to the estimation of 
the latent process states Ni-t- We denote the Fisher Information Matrix (FIM), used in the BCRLB, on the 
path space by [Ji:t (Ni-t)] (j) and marginally by [Jt (Nt)] (j) for time t, conditional on the proposed static 
parameters at iteration j of the PMCMC algorithm. Here we derive an analytic recursive expression for this 
quantity. For Mq we can get an analytic solution whereas for the other models we will resort to AdPMCMC 
based online approximations with a novel estimation method based on the particle filter proposal distribution 
of our AdPMCMC algorithm for each data set. 

Conditional on the previous Markov chain state [9,ni:T] (j — 1) and the new sampled Markov chain 
proposal for the static parameters at iteration 7, \6] (j), we obtain the following modified recursive expression 
for the FIM based on Eq. (21) in [Tichavskv et al. 1998l| : 

'mn,)] (j) = [D^^.m] (j) - [D^'-ANt)] (j) {[jt-im] (j) + [Di\m] u))'' [Di\m] (j), (15) 

where we obtain the fo llowing matrix decompositions of our system model, via Eqs. (34-36) of 



Tichavskv et al.. 1998| under the model assumptions of additive Gaussian process and observation noise 



(note derivatives here are taken under the log transformed models for Nt as specified in the additive Gaussian 
error SSMs in Section [ 



[joiNt)] (j) = -E [Viogivo {Viogivo logp(iVo)}' 

[Dl'.,] (j) = -E [Viog^,_i {Viogiv._i log p{Nt\Nt-i)f] = E{[Viog^,_i/(7V,_i;0)] Q;\ [Viogiv._i/ (iVt-i; 0)]^} 

[Dl'-i] U) = [A'-i] (j) = -E [Viogiv, {ViosN,_, logp{Nt\Nt-i)f] = -E [Viog^,_,/ (iV^-i; 0)] Qr_\; 

[A'-i] (j) = -E [Viogiv, {Viogiv. logp(7Vt|iVt_i)}^] + -E [Viogiv. {Vi^giv, logp{yt\Nt)f] 

= QT-i + E { [Viog ^, h [Nt ■ e)] Ri ' [Viog N,h{Nt;0)f} 

where / (iVt_i; 6) is the state model with process noise covariance Qt and h {Nf, 9) is the observation model 
with observation noise covariance Rf. Next we derive these quantities for each model, summarised in Table 



[2] where the expectations terms Jo(-/Vo) (j) = 



(j) are common to all 



(j) and [Dfl,] (j) = 
models and evaluate as shown for iteration j of the AdPMCMC algorithm. 

Remark 1: The key point about utilising this recursive evaluation for the FIM matrix is that in the 
majority of cases one can not evaluate the required expectations in Table [^ analytically. However, since we 
are constructing a particle filter proposal distribution for the AdPMCMC algorithm to target the filtering 
distribution p {nt\yi:T , [9]U)) ^c can use this particle estimate to evaluate the expectations at each iteration 
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Table 2. Derivation of the BCRLB for eacli model. 



Model 



[A^ii] (j) 



[Dll,] U) 



Mo 
Ml 
Ma 

Ms 
M4 



E[(l + b263exp(63iVt-i))'^] 



E (1 + be exp (A^t-i) + 267 exp {2Nt-i)) 



2 1 



-E 



2- 



-E[(l + exp(iVi_i))Jj] 
[(1 + 62^3 exp (baA^t-i))^ 

cxp(iVt-i^ 



(1 + be exp (ATt-i) + 267 exp {2Nt-i)) -k 



t. It is important to note that this recursion avoids calculating the expectations using the entire empirical 
estimate of the path space, and only requires the marginal filter density estimates for each data set, which 
will not suffer from degeneracy as a path space emprical estimate would. Hence, for example we approximate 



Eti m(l + exp(7V,_i))^ a*) 



the expectation at each time recursion t with E (1 + exp(iVt_i)) 4j 

using the current Markov chain realisation of the parameters [0]{i) and the particle estimate [Nt-i\{j,i) for 

the i-th particle at iteration j of the AdPMCMC algorithm, for each data set. 

Remark 2: We can estimate accurately the BCRLB at each stage of the filter whilst simultaneously in- 
tegrating out the static parameters 6. In addition we note that for the Exponential model (Mq), the BC RLB is 
analytic and optimal in this recursion and is given by the classic information filter, see \Harvev. 199 A ] . That 
is, in the AdPMCMC algorithm for Mq, the particle filtering proposal can be replaced via Rao-B lackwellization 
with the Kalnian filter to obtain the marginal likelihood evaluations Carter and Kohn, 199 A] for this special 
case of the linear Gaussian system. 

Table [3] presents simulation results for models Mq through to M^^. In particular using the estimates 
obtained by the AdPMCMC algorithm for the Bayesian MMSE estimates in Equation (j9]) for the latent 
process -/Vi:t we estimate the following mean square error (mse) quantity for each model 



1 



T 



T / 5I^P(«i 



t=i 




MMSE 



MMSE 



TRUE 



N, 



MMSE 



TRUE 



e\p{e)de 



(j) 



TRUE 



N, 



MMSE 



(j) 



TRUE 



mij) 



(16) 



1 Y.[W,N,U ^) - „™^^ 1 Y.\^tN,U ^) " "P^^ 



mu) 



where 6 denotes the static parameters. We then obtain an average of this estimate over D — 20 independent 
data realizations and report this as the average MSE for the latent state process estimate for each model and 
in (brackets) the standard deviation of the MSE estimate is reported. We then compare these average MSE 
estimates to the BCRLB estimates derived above to provide a performance comparison of our methodology 
as a function of the number of particles, L. 

The results are presented here for each model with L e {20,100,500} particles, T — 50 and 50,000 
iterations post burn- in ( J = 50, 000). We report the results for 20 blocks per AdPMCMC chain in calculation 
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Table 3. Average RMSE for 


jyA/A/sB fQi- 20 independent data realizations (T = 50). 




Average 


Estimated Root Mean Square 


Error (20 blocks, J=50,000) 




L - number of particles 


Mo 


Ml 


M2 


Ms 


M4, 


20 




0.31 (0.03) 


0.32 (0.04) 


0.31 (0.03) 


0.33 (0.03) 


0.32 (0.03) 


100 




0.31 (0.04) 


0.31 (0.03) 


0.30 (0.03) 


0.31 (0.04) 


0.32 (0.04) 


500 




0.31 (0.03) 


0.32 (0.04) 


0.31 (0.04) 


0.31 (0.04) 


0.30 (0.04) 






Averag 


;e Estimated BCRLB 






500 




0.34 (0.01) 


0.38 (0.01) 


0.34 (0.01) 


0.33 (0.01) 


0.35(0.02) 



Table 4. Average RIVISE for Nl.f,^'''^'^ for 20 independent data realizations (T = 50). 


Average Estimated Root Mean Square Error (20 blocks, J= 


=50,000, L = 500) 


T'i Mo 


Ml M2 M3 


A/4 


2x 0.40 (0.05) 


0.39 (0.05) 0.39 (0.03) 0.39 (0.05) 


0.39 (0.06) 


4x 0.50 (0.07) 


0.50 (0.05) 0.51 (0.05) 0.49 (0.07) 
Average Estimated BCRLB 


0.51 (0.07) 


2x 0.38 (0.02) 


0.42 (0.02) 0.36 (0.02) 0.35 (0.02) 


0.38 (0.02) 


4x 0.43 (0.03) 


0.47 (0.03) 0.41 (0.03) 0.38 (0.02) 


0.43 (0.03) 



of the estimated Root Mean Square Error for each chain. However, we also explored varying the number of 
blocks between (10, 20, 40) in the RMSE calculations which we found to have only a minor effect on the 
average MSE estimate. 

We present two tables of results, the first in Table [3] utilises data generated from model settings which 
reflect typical observation and process noise variance levels and the second in Table H] presents results in 
which the noise variance is increased by a factor of two and four. Each data set is generated from the same 
model parameters in each of the models Mq to M4 for comparison purposes. 

The results demonstrate the accuracy of our methodology since the MMSE estimates obtained from 
the simulations are not (statistically) significantly different from the estimated average BCRLB. This pro- 
vides confidence in the AdPMCMC methodology. In addition we see that increasing the noise variance of 
the observations, results in a relatively larger increase in the estimated RMSE compared to the estimated 
BCRLB. In other words as the signal-to-noise ratio decreases, the estimator accuracy also decreases, there- 
fore producing a larger difference between the average BCRLB and the estimated average RMSE. We see 
this effect in Table |31 though it is not overly pronounced. The results in most models at high noise settings 
no longer achieve the BCRLB on average, though they are still reasonably close. 



4.3. Bayes Factors Model Selection Assessment. 

The ideal case for choosing between posterior models occurs when integration of the posterior (with respect 
to the parameters) is possible. For two models M.^ and M^, parameterized by 9i;k and q;i:j respectively, the 
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posterior odds ratio of M, over Mj is given by 

piyi:T\M,)p{M,) ^ p{M,)Jpiyi.,T\9i.,k,M,)p{ei,k\M,)dei.,k ^ p{M,) ^^ 
p{y^..T\M,)p{M,) p{M,)Jp{yi.,T\ai..j,M,)p{ai..,\Mj)dai..j p{M,) '' 

which is the ratio of posterior model probabihties, having observed the data, where p{Mi) is the prior 
probabihty of model Mi and BFij is the Bayes factor. This quantity is the Bayesian version of a likelihood 
ratio test, and a value greater than one indicates that model Mi is more likely than model Mj (given the 
observed data, prior beliefs and choice of the two models). 

To demonstrate the performance of the AdPMCMC in this context we generated 10 data sets using 
the flexible- Allee model, M4 each with the same model parameters. Then we estimated each of the models 
Mq through to M^ using this data, and calculated estimates of the Bayes factors. A representative data 
realization for the parameters of M4 chosen in this simulation is presented in Figure [Tj In this example we 




Fig. 7. A representative data realization for the BF analysis generated from model M4, the Flex-Ricker model. The 
upper dashed line is the carrying capacity (K) and the lower dashed line is the Allee threshold (C). 

considered Nq ~ ln(2),65 = —0.05,66 = 0.0525,67 = —0.0025 with process and observation noise variances 
drawn randomly from the priors as detailed in Section 3.1 . These model parameters were selected to generate 
population trajectories that fluctuate between the population carrying capacity K and the Allee threshold 
C. This makes for a challenging model selection scenario and should result in ambiguity over the actual 
true model, as all models we consider can potentially capture this form of population growth behaviour in 
the presence of the high process and observation noise used here. The results in Table [5] confirm that for 
any given realization of the data from M4 with these model parameters, we see switching between the most 
plausible model to explain the particular realization. We see from the Bayes Factors that there is a strong 
ambiguity between the true model, the flexible Allee M4 used to generate the data and the Ricker model 
Ml . The reason for this is that the process and observation noise obscures the Allee effect but the signal of 
negative density dependence remains, and this is most parsimoniously represented by the Ricker model Mi. 
This synthetic example highlights the challenges faced in choosing between models, when realistic 
levels of observation and latent process noise are present in the population counts data. It also emphasises 
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Table 5. 


Bayes Factors (rounded to in 


tegers) with true model as 


Mi and (T = 50, 


U500, 


J=50l<). 


Data Set 


BFoi 


BFo2 


BFo3 


BFo4 


BFi2 


BFi3 


BFii 


BF2Z 


5^24 


BFm 
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70 









Table 6. Bayes Factors (rounded to integers) witti true model as M4 and (T = 50, L = 500, J = 
50l<). Observation noise variance and process noise variance both decreased by two orders of 
magnitude. 



Data Set 
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1 
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144 


1 














9 
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39 


1 
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1 


10 





1 








116 


2 















the importance of efficient posterior MCMC samplers for each model to ensure that the Bayes Factors are 
accurate. These results also highlight that observation noise and process noise modelling can be critical 
when determining the presence of density dependent mortality or AUee effects in real data sets. 

When the standard deviations of both the process and observation noise were decreased by an order of 
magnitude, then the model selection exercise produces Bayes Factors that identify the correct model, (Table 
[S]). In this reduced noise case, the Bayes Factors for all models (other than M4) versus model M4 were less 
than one. Table [7] explores the effect of observation and process noise on the model selection under different 
signal to noise ratios. We focus on the Bayes factors for model Mi versus M4, as these two models are most 
likely to be ambiguous in this case as they are each capable of capturing an AUee effect to varying degrees. 

The results in Table[7]demonstrate several important points relating to model selection in the presence 
of differing severities of process and observation noise. In low process error settings we see that in the majority 
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Table 7. Bayes Factor BF14 (rounded to integers) with true model as A/4 and (T 
50, L=500, J=50l<). 
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Noise Levels 
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of cases, the correct model has strong evidence for its selection. Secondly, in the case in which the observation 
noise is significantly reduced, we see a reduction in the evidence for the incorrect model Mi. However, this 
is not as marked as when the process error is decreased, indicating that perhaps the process error may have 
a stronger influence on the ability to distinguish the two closely related models. Finally, we confirm that for 
large reductions in both process and observation noise, there is clear selection of the appropriate model M4. 



4.4. Real Data Analysis 

In the following section we present analysis of two real data sets. For each data set, we extended the length 

of the AdPMCMC chains so that the mixing results in a Geweke times series diagnostic approximate Z-score 

in the interval [-2,2] for all parameters of all models. This ensures accurate estimation of the Bayes factors. 

We study the nutria data from Gosling. 1981 



lations now established around the world 



Nutria are a widespread invasive species with popu- 



m 



Woods et al.. 1992|. This data is available as data set 9833 in the 



Global Population Database NERC Centre for Population Biologv. Imperial College. 1999l |. and presents a 
time series of female nutria abundance in East Anglia at monthly intervals, obtained by retrospective census 

for a feral population. 

This data set is of interest because Drake. 2005l | fit models Mq . . . M3 without observation error, and 



found that the AIC selected the strong AUee model M3 as the best overall model whereas BIG selected the 
Ricker model Mi as the best overall model. Here we revisit this data set, include observation error, add an 
additional model (-M4) that allows for both strong and weak Alice effects, and use Bayes factors for model 
selection. 

In Figure[8]we present the two most plausible models for the nutria data, according to the Bayes factor 
analysis in Table[8]with the following settings: T = 120, L=500, J = 350,000 p ost burn-in, (burn-in = 155,000 
and annealing = 5,000). In this analysis, the default scalings proposed by Roberts and Rosenthal. 200S 
for the non-adaptive component in Algorithm 2 proved suboptimal for models A/i_3,4. We subsequently 
scaled down the non-adaptive components in these models by factors of 10~'^ to 10~^ and made a similar 
adjustment in the annealing phase. The MMSE estimates for the path space N^^'^^^, thinned by 10%, 
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are shown for models M4 and M2 , together with the unthinned Markov chain trace plots of the noise and 
process model variances. The solid lines are the MMSE of the path space and the points are observations 
with the gray shading corresponding to the posterior 95% credibility intervals. 
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Fig. 8. Nutria data analysis. Left plots - model A/4 and Right plots - model Ah- Top panel - estimates of the MMSE 
for the path space (solid lines), observations from data (points) and posterior 95% predicitive interval (shaded). Lower 
panels - trace plots of noise standard deviations for observation and state process. 



Our findings demonstrate posterior evidence for the selection of model M4, over all other models, in 
the presence of jointly estimated observation and state noise. With the selection of model M4, it is interesting 
to comment on the MMSE estimate for the model parameters from this fit. There is a small probability 
(< 0.01) that the only stable equilibrium is zero; that is, there is no stable positive equilibrium K and the 
population will go extinct. Conditional on the existence of a positive per capita population growth rate, 
that is positive for some population density, the estimate for the Allee threshold C in model M4 is -2484 
(mean) [-38814,796] (0.95 CI) relative to a carrying capacity K = 3403 [2369,4437]. This suggests that a 
weak Allee effect is present in the nutria population. 

If we ignore the new model M4 , the Bayes factors (Table H]) identify model Mi , which has no Allee 
e ffect, as the be st fitting model, over model A/3, which only has a strong Allee effect. In contrast, the results 



of 



Drake 



the best. 



20051 are am biguous, where AIC results suggest that A/3 is the best model but BIC suggests A/i is 



Drake [2005J notes that the differences in both AIC and BIC results are minor and do not provide 



compelling support for one model over the other. These results show the utility of considering models (such 
as A/4) that allow for a weak Allee effect 

The second real data set we analyse is a tim e series of a population of spa rrowhawks {Accipiter nisus) in 



south Scotla nd. This is a well studied pop ulation 



dependence 



Newton and Rotherv. 1997| that shows evidence of density 



Newton and Marquiss. 1986l |. This data set (number 6575 in the Global Population Database 



22 



NERC Centre for Population Biology. Imperial College. 1999| ) is of interest because 



3) 



Polanskv et al.. 200' 



3 



showed that the likelihood surface of the theta-logistic mod el applied to this dat a, with both process and 



Polanskv et al. 



. 2009t 



calculate the maximum 



observation noise, has strong ridges and multiple modes, 
likelihood estimates of the theta-logistic's parameters and warn that its complex likelihood surface may 
derail methods of model comparison. They calculated that the parameter controlling the form of density 
dependence (equivalent to parameter 63 in M2) equals -4.83 at the global maximum of the likelihood surface. 
A second local maximum was found when 63 = 0.04. 

The Bayes factors (Table O with algorithm settings (T = 18, L = 500, J = 150,000 post burn-in, 
burn-in = 50,000 and annealing = 5,000), however, suggest that the Ricker model Afi is the best fitting 
model.. This is equivalent to the theta-logistic M2 with b^ — 1. In our analysis the theta-logistic model is the 
worst of the five models. Note also that the AdPMCMC sampler mixes suitably well over the complicated 
multimodal posterior support (Figure [9]). Here we have again used the Gcweke diagnostic to determine the 
number of post burn-in samples, but we recognise that we may require a larger number of samples in order 
to accurately estimate the weighting or proportion of posterior mass in each of the two modes. The fact 
remains that the sampler is able to mix over the two modes, enabling model selection via Bayes factors even 
in the presence of ridges and multiple modes in the likelihood surface, assuming that the sampler mixes 
sufficiently well over the posterior support. 
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Fig. 9. Scatter plots of the posterior distributions for the static parameters 6 for A. nisus data set under model M2. This 
plot also contains the smoothed estimated marginal posterior distributions for each static parameter, followed by the 
estimated linear posterior correlation between each static parameter. 
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Table 8. Bayes Factors (rounded) for nutria. 
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Table 9. Bayes Factors (rounded) for A. 
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5. Discussion 



This pa per presents a novel technical d evelopment: a sophisti cated sampling metho dology based on Adaptive 



MCMC 



Roberts and Rosenthal. 2009l | and Particle MCMC Andrieu et al.. 2010 1 : that enables a novel ap 



plication: an efficient and robust approach to jointly estimate observation and process noise with the latent 
states and static parameters in Bayesian nonlinear state space models of population dynamics. We developed 
a novel adaptive version of the PMCMC algorithm and demonstrated its performance on several challenging 
high dimensional posterior models. These models were based on population dynamic models that are cited 
widely in the literature. We have also placed the problem of statistical inference and model selection in a 
more realistic setting by accounting for, and estimating, process error and observation noise jointly with the 
model parameters and latent population process, without the need for time-consuming tuning and algorithm 
design. We believe that the novel sampling framework presented here can be easily generalised to different 
ecological state space models. 

The AdPMCMC sampling methodology we deve loped is more sophisticat e d than the standard MCMC 



Millar and Meyer. 200C , 



Clark and Biornstad. 2004 , 



algorithms currently employed in ecological settings 
Ward. 2006l | in which the equations describing the latent population dynamics are often highly non-linear. 
In addition, there is typically a strong correlation between the model parameters and the latent process. 
These two factors will make the mixing pro perties of basic MCM C o r block-Gibbs sam pling algorithms, 
highly inefficient in a time series setting, see Andrieu et al.. 2010l | and Nevat et al.. 2010l | for discussion. 

To understand this, consider the simple sampling framework in which the posterior distribution for 
model Mi, denoted p{6 , ni-.Tlyi-.T , Mi) is sampled in the following block Metropolis-Hastings within Gibbs 
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framework, where the vector of latent states is split into k block of length kr = T, 

iteration j: @ ^ p{9\ni.,T,yi:T, Mi) 
iteration j: A^i^^- -- p{ni.,r\0, Ur+i-.r, yi-.T, Mi) 

iteration j: A^(A;-i)t+1:T ~ p{n(k-i)T+i:T\d, ni:(fc-i)T, Vv.t, M{). 

Such a block design is highly inefficient if the parameters of the posterior distribution are correlated, as 
in the case of the A. nisus data set (Figure [9]). For moderate sized values of r this sampling framework will 
mix poorly because the Metropolis-Hastings acceptance probabilities will be low irrespective of the proposals 
that are used. The simplest solution to this problem would be to sample directly from the full conditional 
distributions in a univariate Gibbs framework. However, even in the case where a single component Gibbs 
sampler is possible via inversion sampling from the full conditional cdf 's for all parameters, the Markov chain 
will mix very slowly around the support of the posterior. This is especially problematic in high dimensional 
target posterior distributions, since it requires excessively long Markov chains to achieve samples from 
the stationary regime. It can also lead to very high autocorrelations in the Markov chain states, with a 
concomitant impact on the variance of the estimators in Equation [HI To avoid slow mixing Markov chains, 
one must sample from larger blocks of parameters, - i.e. larger r. However, the design of an optimal proposal 
distribution for large blocks of parameters is very complicated. The Particle MCMC methodology steps 
around this problem by approximating the optimal proposal distribution for a large number of parameters 
via a Sequential Monte Carlo proposal distribution. 

Another positive property of the AdPMCMC methodology is evident in the Markov chain sample 
paths, shown in Figure [H As the AdPMCMC sampler mixes over different modes , the estimated adapted 
covariance in the static parameter proposal "adapts" or "learns" on-line to account for the additional mode 
and allows for more efficient mixing between the two modes of the posterior. This demonstrates the power 
of the adaptive MCMC methodology when combined within the PMCMC sampler. 

Previous meta-analyses of ecological time-series have used model selection techniques to conclude 
that Allee effects are rare without explicitly accounting fo r the confounding effects of observation error 



Saether et al.. 1996 . 



Siblv et al.. 2005 . 



Gregory et al.. 2010l | . Here we used a synthetic dataset to show that 



moderate process and observation error will lead to ambiguous model selection results in ecological time 
series that include strong Allee effects. This occurs because the noise masks the full signal of the determinstic 
density-dependent processes. Our results suggest, however, that negative density dependence is much easier 
to detect than the Allee effect. Indeed, a parsimonious model that includes negative dependence but omits 
the Allee effect will often be selected in the presence of observation error. The chance of correctly choosing 

a model with an Allee effect can be increased by decreasing the observation error. 

Nevertheless, we did find evidence for a weak Allee effect in the nutria data set studied by Drake. 2005| . 



who reported the AIC and BIG model selection in the absence of observation noise. We included observation 
noise and extended the class of models to include M4, which allows for both strong and weak Allee effects, 
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and we found posterior evidence for model M4 over all other models. Furthermore, we found that the 
second most likely model to explain the data was Mi, which lacks an Allee effect, over model M3, which 
only has a st rong Allee effect. This agrees with the BIC results in the analysis without observation noise of 



Drake. 2005 1. 



In the A. nisus data set. 



M 



Polanskv et al.. 2009l | use a numerical approach inspired by 



Kitigawa. 1987 1 



to find multiple modes with local optima that suggests a nonlinear relationship between population density 
and per capita growth r ate. The AdPM CMC sampling methodology demonstrates efficient mixing between 



these multiple modes. 



Kitigawa. IQSTj acknowledges that a grid-based approach will not scale efficiently 



with increasing dimensions in the latent path space or the static parameters; for instance, multispecies 
analyses will be precluded. On the other hand, our AdPMCMC approach will scale u p to much larger 



dimensions. If an optimization algorithm is used instead, as in [Polansky et al.. 2009l |. then it becomes 
difficult to calculate the uncertainty associated with the maximum likelihood estimates. In particular, it is 
difficult to determine the joint uncertainty of the static parameters and the latent path space estimates . In 
contrast, the AdPMCMC approach directly approximates these joint densities. 

It is important to note that model parameterisation has important implications for the choice of static 
parameter priors in the popul ation models used here. In this paper we have followed a common practice in 
the ecological literature, e.g., Clark. 2007j |. wherein parameters such as the intrinsic rate of growth r and 



carrying capacity K are combined into new parameters bi such that the model on the log scale is linear in all 
(e.g., Afo.1,4) or some (e.g., M2.3) of the parameters. The bi for these models are often nonlinear functions 
of ecological parameters. We have observed that independent Gaussian priors on the hi induce strong prior 
dependence among the ecological parameters. The AdPMCMC approach does not require linearity in the 
parameters and can fit the underlying ecological parameters directly and efficiently. In other words our 
methodology is general and can be applied to any parameterization either linear or non-linear, under any 
chosen prior structure with or without dependence. 

On the other hand, the more general hi formulation allows researchers to question how much support 
the data gives to parameter spaces consistent with the traditional ecological formulations. For example. 
Figure |9] shows that it is unlikely for both 6o;&2 > 0. This corresponds to unchecked population growth in 
the latent model; this unlikely but not impossible resu lt is precluded in the traditional formulation of the 
theta- logistic in terms of r, K, 9 [Morris and Doak. 2002l | . The more general approach is useful because there 
are many ways to parameterize a population model's structure, and different ecological considerations may 
lead to quite different constraints. For instance, previous stud ies have pr oposed the following constraint s 
in the theta- log istic model M2: 63 > 



Ross. 2006 . 



Ward. 2006| . 63 > -1 



Saether et al. 2008 



Ross. 2009t . 



sgn&3 = sgn&o Polanskv et al.. 2009l |. Figure |9] shows that these constraints may be unrealistic or unduly 
restrictive in the presence of both process error and observation noise. We argue that, wherever possible, it is 
better to estimate the probability of constraints rather than impose them, and the more general formulation 
provides a way to achieve this. Finally, we note that hyperpriors on the prior parameter values can also be 
considered in the AdPMCMC framework to address the usual concerns about prior sensitivity. 
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In our opinion the most important results of this study: a) are the confounding effects of observation 
and process errors on model sclection;and, b) the ability of the AdPMCMC to mix efhciently over the 
complex ridges and multiple modes of the likelihood surfaces associated with at least some population 
dynamic models. In particular, it is clear that the development of adaption in the MCMC proposal used 
for the static parameters clearly improves mixing. Through the use of Bayes factors we have emphasised 
how important it is to consider a range of alternative models when seeking to understand the nonlinear and 
density dependent effects that drive population dynamics. It is important to recognise, however, that our 
limited understanding of these effects, reflected in the process error of a population model, coupled with our 
imperfect measuring devices, reflected in the variance of a observation model, may hamper our ability to 
distinguish between different plausible models. It is important that ecologists recognise this, particularly if 
a "best fitting" model to make predictions of population trajectories beyond the observations. 

The AdPMCMC algorithm presented here is capable of efficient SSM inference in non-linear population 
dynamic models with complex likelihood surfaces. It thereby frees practioners from the potentially slowly 
mixing constraints of the SSM algorithms, particularly Metropolis Hastings within Gibbs, that are currently 
available. The AdPMCMC strategy can be readily generalised to other equivalent problems, and moreover, 
readily extends to include more complex SMC methods that also incorporate adaption and/or more realistic 
error variance structures. For these reasons, we believe the algorithm holds great promise in applied contexts. 
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Algorithm 1 Construction of optimal path space proposal for Model Mi given by p {ni:T\yi:T , [S] [j + 1)) 
Initialisation of SIR filter at iteration (j + 1) of the Markov chain 

1: SIR particle filter for Ni-t'- initialise L particles {[jt-i] {j + 1,*)}^^!.^ via sampling from the priors in 

Section 13.11 

2: for i = 2,...,rdo 

Perform mutation of the L particles at time t ~ 1 to obtain new particles at t via state evolution. 

3: Sample the i-th particle [nt] (j + 1, i) from particle filter proposal according to state equation given in 

relvant model Mi Section [521 Example M3 sample 

Nt ^N{2\og[Nt_,]{j,t) - log(N(j + 1,*) + [Nt^i]{j,t)) + N(J + !,«)) + 

(17) 
N{[h]{j + l,i)[Nt^i]{j,i), [a^]{j + l,i)) 



Incremental SIR importance sampling weight correction. 



4: Evaluate the unnormalised importance sampling weights, 
i-th weight given by 



Wt 



Wt 



{j + l,i), for the L particles, with the 

(j + l,i)(x [Wt-i] {j + 1, i) [wt-i] a + 1, i) 

(18) 
^[Wt-i\{j + l,i)p{yt\[nt,e]{j + l,i)), 



5: Normalise the importance sampling weights \Wt\ (j + l,i) = =rL — \~ ^, -' — r 
Evaluate the importance estimate and resample adaptively. 
6: Calculate the Effective Sample size, ESS = 



7: If the Effective Sample size is less than 80% then resample the particles at time t using stratified 
resampling based on the empirical distribution constructed from the importance weights to obtain 
new particles with equal weight. 

8: end for 

9: Evaluate marginal likelihood p{yi:T\ [d] (j + 1)) = Ilt^i ( T J2i=i i^t] {j + 1, *) 
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Algorithm 2 Adaptive MCMC for static parameters 



1: Sample a realisation ui of random variable Ui ^ [/[0, 1] 
Sample from the adaptive m,ixture proposal il3\) 
2: if ui > wi then 

Sample [6] (j + 1) from the adaptive component of the mixture proposal of il3] 
3: Estimate Sj, the empirical covariance of 0, using samples {[0]ii)}i=i:j- 
4: Sample proposal [9] {j + I) ^ N (O; [9] (j), ^^^^j) ; 
5: else 

Sample [6] {j + 1) from the non-adaptive component of the mixture proposal of (J 
6: Sample proposal [6] {j + I) ^ N (O; [6] {j), ^/d,rf 
7: end if 
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